Module 01

Module 01 portfolio check

  • Evidence worksheet_01
    • Completion status: X
    • Comments:
  • Evidence worksheet_02
    • Completion status: X
    • Comments:
  • Evidence worksheet_03
    • Completion status: X
    • Comments:
  • Problem Set_01
    • Completion status: X
    • Comments:
  • Problem Set_02
    • Completion status: X
    • Comments:
  • Writing assessment_01
    • Completion status:
    • Comments: Missing Writing assessment_01 –> See posted screenshots of the essay.
  • Additional Readings
    • Completion status:
    • Comments: Need links –> Links are now added.

Data Science

  • Installation check
    • Completion status: X
    • Comments:
  • Portfolio repo setup
    • Completion status: X
    • Comments:
  • RMarkdown Pretty PDF Challenge
    • Completion status: X
    • Comments:
  • ggplot
    • Completion status: 9/10
    • Comments: Exercise 3 should be at a different taxonomic level than the example plot.

Installation check

Portfolio repo setup

  • git status - to check whether my local repo is up to date with the master repo
  • git fetch, then git pull - to pull the files from the master repo into my local repo
  • git add . - to place files in the staging area
  • git commit -m "Commit message" - to commit the staged files along with a message
  • git push - to push the commits to the master repo

R Markdown pretty PDF challenge

The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.

http://phdcomics.com/ Comic posted 1-17-2018

Challenge Goals

The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)

hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown

Here’s a header!

Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).

Another header, now with maths

Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:

1231521+1234155628098
## [1] 1.234157e+12

Table Time

Or maybe, after you’ve added those numbers, you feel like it’s about time for a table! I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.

library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
I made this table with kable in the knitr package library

    speed            dist
    Min.   : 4.0     Min.   :  2.00
    1st Qu.:12.0     1st Qu.: 26.00
    Median :15.0     Median : 36.00
    Mean   :15.4     Mean   : 42.98
    3rd Qu.:19.0     3rd Qu.: 56.00
    Max.   :25.0     Max.   :120.00

And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!

tidyverse

#Load Library
library("tidyverse")
## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.4
## ✔ tidyr   0.8.0     ✔ stringr 1.3.0
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
#Exercise 1
metadata = read.table(file="Saanich.metadata.txt", header=TRUE, row.names = 1, sep="\t", na.strings="NAN")
OTU = read.table(file="Saanich.OTU.txt", header=TRUE, row.names = 1, sep="\t", na.strings="NAN")

#Exercise 2

metadata %>% rownames_to_column('sample') %>% 
  filter(CH4_nM >=100, Temperature_C<=10) %>%
  column_to_rownames('sample') %>% 
  select(Depth_m,CH4_nM,Temperature_C)
#Exercise 3

nM_to_uM_Metadata_coversion <-metadata %>% rownames_to_column('sample') %>% 
  select(matches("nM"), matches('sample')) %>% 
  mutate(N2O_uM = N2O_nM/1000, Std_N2O_uM = Std_N2O_nM/1000, CH4_uM = CH4_nM/1000, Std_CH4_uM = Std_CH4_nM/1000) %>% 
  column_to_rownames('sample')

#For Exercise 3: all variables measured in nM are converted to μM. The output table, nM_to_uM_Metadata_coversion, shows only the original nM variables and the converted μM variables.

ggplot

library("tidyverse")

source("https://bioconductor.org/biocLite.R")
## Bioconductor version 3.6 (BiocInstaller 1.28.0), ?biocLite for help
biocLite("phyloseq")
## BioC_mirror: https://bioconductor.org
## Using Bioconductor 3.6 (BiocInstaller 1.28.0), R 3.4.3 (2017-11-30).
## Installing package(s) 'phyloseq'
## 
## The downloaded binary packages are in
##  /var/folders/z3/h3tm9hss3bx6zbhrqjq0f5y80000gn/T//RtmpfPClb3/downloaded_packages
## Old packages: 'ade4', 'ape', 'bindr', 'bindrcpp', 'broom', 'callr',
##   'cluster', 'curl', 'foreign', 'glmnet', 'igraph', 'kableExtra',
##   'lubridate', 'Matrix', 'nlme', 'plogr', 'psych', 'Rcpp', 'readxl',
##   'selectr', 'stringi', 'survival', 'tinytex', 'vegan', 'withr'
library("phyloseq")

load("phyloseq_object.RData")

#Exercise 1

ggplot(metadata, aes(x=PO4_uM, y=Depth_m)) + 
  geom_point(color="purple", shape=17)

#Exercise 2
metadata %>% 
  mutate(Temperature_F= Temperature_C*9/5+32) %>% 
  ggplot() + geom_point(aes(x=Temperature_F, y=Depth_m))

#ggplot with phyloseq
plot_bar(physeq, fill="Phylum")

physeq_percent = transform_sample_counts(physeq, function(x) 100 * x/sum(x))
plot_bar(physeq_percent, fill="Phylum")

plot_bar(physeq_percent, fill="Phylum") + 
  geom_bar(aes(fill=Phylum), stat="identity")

#Exercise 3
plot_bar(physeq_percent, fill="Phylum", title = "Phyla from 10 to 200 m in Saanich Inlet") +
  geom_bar(aes(fill=Phylum), stat="identity") + 
  labs(x="Sample depth", y="Percent relative abundance")

#Faceting
plot_bar(physeq_percent, fill="Phylum") +
  geom_bar(aes(fill=Phylum), stat="identity") +
  facet_wrap(~Phylum)

plot_bar(physeq_percent, fill="Phylum") +
  geom_bar(aes(fill=Phylum), stat="identity") +
  facet_wrap(~Phylum, scales="free_y") +
  theme(legend.position="none")

#Exercise 4

plot_nutrients= metadata %>% 
  select(Depth_m, NH4_uM,NO2_uM, NO3_uM, O2_uM, PO4_uM, SiO2_uM) %>% 
  gather(Nutrients, Concentration, NH4_uM,NO2_uM, NO3_uM, O2_uM, PO4_uM, SiO2_uM)

ggplot(plot_nutrients, aes(x=Depth_m, y=Concentration)) +
  geom_point() + geom_line() +facet_wrap(~Nutrients, scales="free_y") +
  theme(legend.position = "none") 

Origins and Earth Systems

Evidence worksheet 01 “Prokaryotes: The unseen majority”

Whitman et al 1998

Learning objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

General questions

  • What were the main questions being asked?

    • What is the estimated number of prokaryotes on Earth, specifically in seawater, soil, and the sediment/soil subsurface?
    • How much of the total carbon on Earth is held in prokaryotes?
  • What were the primary methodological approaches used?

    • The three largest habitats were used to estimate the total number and total carbon of prokaryotes on Earth:
    1. For the aquatic environments, the volumes of oceanic water, freshwater/saline lakes, and polar regions were multiplied by the corresponding average cellular densities to calculate the number of cells for each region. For the polar regions in particular, the number of prokaryotes estimated by Delille & Rosiers and the mean areal extent of seasonal ice were also used in the calculation.
    2. For the soil, the authors relied on detailed direct counts from a coniferous forest ultisol, as it was generally considered representative of forest soil.
    3. For the subsurface, the first approach is based on assumed values for the average porosity of the terrestrial subsurface (3%) and the fraction of the total pore space occupied by prokaryotes (0.016%). The other approach multiplied the estimated number of prokaryotes at various groundwater sites by the total volume of groundwater on Earth.
  • Summarize the main results or findings.

    • Based on oceanic, soil, and subsurface habitats, the estimated total number of prokaryotes is 4 to 6 x \(10^{30}\) cells
    • Prokaryotes are estimated to hold 350 to 550 Pg of C (1 Pg = \(10^{15}\) g)
    • The prokaryotic carbon pool is ~ 60 to 100% of the total carbon found in plants globally
    • Prokaryotes contain ~ 85 to 130 Pg of N & 9 to 14 Pg of P, which is about 10-fold more than plants
    • Most prokaryotes are found in ocean, soil, and oceanic & terrestrial subsurface habitats
      • 1.2 x \(10^{29}\) cells in open ocean
      • 2.6 x \(10^{29}\) cells in soil
      • 3.5 x \(10^{30}\) cells in oceanic subsurface
      • 0.25 to 2.5 x \(10^{30}\) cells in terrestrial subsurface
    • The average prokaryotic turnover times are:
      • Upper 200 m of the ocean: 6 to 25 days
      • Ocean below 200 m: 0.8 years
      • Soil: 2.5 years
      • Subsurface: 1 to 2 x \(10^{3}\) years
    • Cellular production rate is ~ 1.7 x \(10^{30}\) cells/year (highest in open ocean)
    • The abundance of prokaryotes offers an enormous capacity for genetic diversity
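
As a rough cross-check of the habitat figures above, the per-habitat cell counts can be summed in R. This is a minimal sketch rather than the authors' calculation: the data frame and variable names are made up for illustration, and the oceanic subsurface is assumed to hold 3.5 x 10^30 cells.

```r
# Cell counts per habitat: low and high ends of each reported range.
# Assumption: the oceanic subsurface is taken as 3.5e30 cells.
habitat_cells <- data.frame(
  habitat = c("open ocean", "soil", "oceanic subsurface", "terrestrial subsurface"),
  low     = c(1.2e29, 2.6e29, 3.5e30, 0.25e30),
  high    = c(1.2e29, 2.6e29, 3.5e30, 2.5e30)
)

# Sum each column to get the bounds on the global total
total_low  <- sum(habitat_cells$low)
total_high <- sum(habitat_cells$high)

cat(sprintf("Total: %.1e to %.1e cells\n", total_low, total_high))
```

Under these assumptions the two sums bracket the reported global estimate of roughly 4 to 6 x 10^30 cells.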
  • Do new questions arise from the results?

    • There were many assumptions made in this study. Thus, how accurate were the numbers?
    • Prokaryotic biomass is rich in nitrogen and phosphorus, holding about an order of magnitude more of these nutrients than plants. This indicates the significant role prokaryotes play in global C, N, and P nutrient cycles. Are there other global events or factors that prokaryotes are involved in?
    • Microbes have high mutation rates, which could affect their turnover rates and consequently the cycling of nutrients such as C, N, and P. To what extent do prokaryotes contribute to the total metabolic potential of the Earth’s ecosystems?
    • How diverse are the prokaryotes in each of the habitats?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

    • There were a lot of numbers presented in the study, and it would be really helpful if the authors provided more information on how they performed the various calculations.
    • Combining multiple estimated values also compounds the associated errors, which decreases the statistical confidence in the results

Evidence worksheet 02 “Life and the Evolution of Earth’s Atmosphere”

Kasting, JF, Siefert, JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science. 296:1066-1068. doi: 10.1126/science.1071184. Link

Learning objectives:

Comment on the emergence of microbial life and the evolution of Earth systems

  • Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.

    • 4.6 billion years ago
      • Solar system formed. Inner planets received water vapour and carbon. At this time, carbon dioxide concentration and vapour pressure were high, and the temperature was about 500 degrees Celsius.
    • 4.5 billion years ago
      • Moon was formed, which allowed the Earth to spin and tilt, giving day/night cycles and different seasons.
    • 4.4 billion years ago
      • Zircons (the oldest minerals) formed
      • Earth had cooled to about 100 degrees Celsius
    • Between 4.4 and 4.1 billion years ago
      • A meteor impact
    • 4.1 billion years ago
      • Evidence of life found in zircons
    • 4.0 billion years ago
      • The oldest rock: Acasta gneiss
      • There was evidence of plate subduction
      • Greenhouse carbon dioxide increased
      • Meteorite bombardment halted and seawater chemistry stabilized.
    • 3.8 billion years ago
      • Chemical fossils, such as carbon isotopes found in rocks, provide further evidence for early life. Enrichment in C-12 suggests the possibility of photosynthesis. However, non-photosynthetic autotrophs can also produce this C-12 signature.
    • 3.5 billion years ago
      • Structural fossils, such as stromatolites (bacterial aggregations), found in rocks.
      • Early methanogenesis.
    • 3.0 billion years ago
      • A glaciation occurred.
      • Early cyanobacteria, evidence for photosynthesis
      • Great Oxidation Event
      • Life on land
    • 2.7 billion years ago
      • Emergence of prokaryotes
    • 2.2 billion years ago
      • rock recognized as red beds
      • Oxygen levels increased sharply
    • 2.1 billion years ago
    • 1.8 billion years ago
    • 1.3 billion years ago

    • 550 million years ago
      • Cambrian explosion, expansion of multicellular evolution
      • Devonian explosion, emergence of woody land plants
      • Carboniferous period, presence of fish, cephalopods, corals
      • Formation of Pangaea resulted in a dry, harsh climate in Pangaea’s interior and increased competition among species as they were clustered on one giant land mass.
      • Permian extinction, 95% of species gone
    • 400 million years ago
      • Oxygen levels increased again, resulting in the rise of giants
      • Hence, rise of dinosaurs
      • Cretaceous-Tertiary extinction event, after which no organism over 10 kg survived.
      • Increase in mammal size and diversification
      • Dramatic global warming
      • Ice age event
      • Grasslands started to dominate over forests
    • 200,000 years ago
      • Homo sapiens first appeared
  • Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:

    • Hadean
      • With a temperature of 500 degrees Celsius and high levels of carbon dioxide and water vapour, the Earth was practically a molten object.
    • Archean
      • A glaciation occurred.
      • Soon after, the planet became brown and hazy due to methanogenesis. Methane produced by methanogens helped keep the Earth warm. Otherwise, the Earth would have stayed frozen and perhaps no life would have existed.
    • Precambrian
      • Photosynthesis evolved, resulting in some oxygen in the atmosphere.
    • Proterozoic
      • During the early Proterozoic, another glaciation event occurred. Once again, the Earth system “shut down”.
      • Oxygen reacted with atmospheric methane to form carbon dioxide. This caused a net decrease in the greenhouse effect, making the Earth cold and leading to glaciation.
      • The oxygen also started oxidizing iron, forming the banded iron formations seen in sedimentary rock.

    • Phanerozoic
      • There was increased oxygenation of the atmosphere
      • Plants started to evolve
      • Coal deposits from dead organisms, caused by multiple extinction events, were stored in sediments
      • Once again, glaciation occurred at various periods

Evidence worksheet 03 “The Anthropocene”

Waters, CN, Zalasiewicz, J, Summerhayes, C, Barnosky, AD, Poirier, C, Galuszka, A, Cearreta, A, Edgeworth, M, Ellis, EC, Ellis, M, Jeandel, C, Leinfelder, R, McNeill, JR, Richter, DD, Steffen, W, Syvitski, J, Vidas, D, Wagreich, M, Williams, M, An, ZS, Grinevald, J, Odada, E, Oreskes, N, Wolfe, AP. 2016. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science. 351:137. doi: 10.1126/science.aad2622. Link

Learning objectives

  • Evaluate human impacts on the ecology and biogeochemistry of Earth systems.

General questions

  • What were the main questions being asked?

    • “Have humans changed the Earth system to such an extent that recent and currently forming geological deposits include a signature that is distinct from those of the Holocene and earlier epochs, which will remain in the geological record? If so, when did this stratigraphic signal (not necessarily the first detectable anthropogenic change) become recognizable worldwide?”
    • Should we officially formalize the Anthropocene, or even its early events, considering the vast changes humans have made to the Earth system?
  • What were the primary methodological approaches used?

    • Reviewing several lines of research evidence suggesting that the Anthropocene’s stratigraphic signatures distinguish it from the Holocene. This includes measurements of:
      • Novel markers, such as concrete, plastics, global black carbon, and plutonium (Pu) fallout, shown with radiocarbon (14C) concentration
      • Long-ranging signals such as nitrates (NO3-), CO2, CH4, and global temperatures
  • Summarize the main results or findings.

    • Signs of the Anthropocene:
      • modification of carbon, nitrogen, and phosphorus cycles
      • accelerated rate of sea-level rise and perturbation of the climate system
      • global spikes in fallout radionuclides and particulates from fossil fuel combustion
      • species invasions worldwide and accelerating rates of extinction
    • Lake sediments currently show signs of the Anthropocene that truly differ from Holocene signatures:
      • unprecedented combinations of plastics, fly ash, radionuclides, metals, pesticides, reactive nitrogen, and consequences of increasing greenhouse gas concentrations.
      • glacier retreat due to climate warming has resulted in an abrupt stratigraphic transition from proglacial sediments to nonglacial organic matter
  • Do new questions arise from the results?

    • When did the Anthropocene exactly begin? Is it possible to determine a specific year given the data we have?
    • What are the benefits and/or disadvantages of officially recognizing the Anthropocene?
    • Is Anthropocene “bad” for humanity?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

    • The photo of the lake sediment recording glacial retreat due to climate warming is a powerful image that gives a “snapshot” of the onset of the Anthropocene
    • By combining all the evidence currently available, the authors were able to match the criteria for defining Quaternary stratigraphic units to the Anthropocene, which sets the stage for formalizing it.
    • Providing signals, and suggestions for how they may be used in the stratigraphic characterization and correlation of the Anthropocene epoch, furthered the authors’ claim.

Problem set 01 “Prokaryotes: The unseen majority”

Whitman et al 1998

Learning objectives:

Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.

Specific questions:

  • What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.

    • 1.2 x \(10^{29}\) cells in open ocean
    • 2.6 x \(10^{29}\) cells in soil
    • 3.5 x \(10^{30}\) cells in oceanic subsurface
    • 0.25 to 2.5 x \(10^{30}\) cells in terrestrial subsurface
  • What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?

    • 3.6 x \(10^{28}\) cells in the upper 200 m of the ocean
    • 2.9 x \(10^{27}\) cells are autotrophs
    • 8% of the biomass in the upper 200 m of the ocean is represented by marine autotrophs, including cyanobacteria such as Prochlorococcus
    • These autotrophs, about 8% of the cells, fix the carbon cycled through the Earth’s oceans, which ultimately supports carbon availability for the remaining 92%, the heterotrophs
  • What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?

    • An autotroph uses inorganic carbon (i.e. carbon dioxide) as its carbon source, while a heterotroph assimilates organic carbon sources (2). Autotrophs are also self-nourishing and capable of fixing inorganic carbon dioxide into biomass.
    • A lithotroph uses an inorganic chemical as its electron source (2).
  • Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?

    • The text reports prokaryotes up to 4 km deep in the subsurface.
    • Taking into account the deepest point in the ocean, the Mariana Trench (10.9 km deep), life could potentially exist at 14.9 km below sea level.
    • The limiting factor at these depths is temperature. At 4 km deep into the substrate, the temperature is ~ 125 degrees Celsius.
  • Based on information provided in the text and your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?

    • The text suggests that the atmosphere at 77 km above the Earth’s surface is the upper bound for supporting prokaryotic life. The highest habitat on Earth is Mount Everest, which is approximately 8.8 km above sea level. Some factors that can limit survival of airborne prokaryotes are nutrient availability, moisture (desiccating conditions) and UV radiation (3).
  • Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?

    • From the top of Mount Everest (8.8 km high) to the bottom of the Mariana Trench (10.9 km deep + 4 km deeper into the sediment), there is about 24 km of vertical distance where prokaryotes presumably can live.
  • How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)

    • population size x number of turnovers per year = cells per year

    • Example: Using the data for marine heterotrophs: 3.6 x \(10^{28}\) cells x (365 days / 16 days per turnover) = 8.2 x \(10^{29}\) cells/year
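
The production calculation above can be written as a small R helper. This is only an illustrative sketch: the function name annual_production and its argument names are hypothetical, and the inputs are the marine heterotroph values used in the example (3.6 x 10^28 cells, one turnover every 16 days).

```r
# Hypothetical helper: annual cellular production from population size
# and turnover time (days per turnover).
annual_production <- function(population_cells, turnover_days) {
  turnovers_per_year <- 365 / turnover_days
  population_cells * turnovers_per_year
}

# Marine heterotrophs in the upper 200 m of the ocean
marine_heterotrophs <- annual_production(population_cells = 3.6e28,
                                         turnover_days = 16)

# Report to two significant figures, matching the precision of the inputs
print(signif(marine_heterotrophs, 2))
```

The same helper could be applied to the other rows of the table by swapping in their population sizes and turnover times.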

  • What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?

    • According to the text, the carbon assimilation efficiency is approximated to be 20%.
    • Assumption: 20 fg of C per prokaryotic cell, which is 20 x \(10^{-30}\) petagrams (Pg)
    • Amount of carbon in marine heterotrophs = 3.6 x \(10^{28}\) cells x 20 x \(10^{-30}\) Pg of C/cell = 0.72 Pg of C
    • With 20% of the carbon assimilated and about 80% lost, the working estimate is 4 x 0.72 = 2.88 Pg of C per turnover for marine heterotrophs

    • 51 Pg of C/year x 85% consumed in photic waters = 43 Pg of C/year consumed
    • 43 Pg of C consumed/year / 2.88 Pg of C per turnover = 14.9 turnovers/year, or 1 turnover every 24.5 days

    • The variation of carbon assimilation with depth is primarily due to the different carbon production and microorganism composition found in each habitat.
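
The carbon arithmetic above can be traced step by step in R. This is a sketch of the working shown in this answer, not a calculation taken from the paper: the variable names are made up, and the factor of 4 is carried over from the working estimate above as an assumption.

```r
# Standing stock of carbon in marine heterotrophs (upper 200 m)
cells             <- 3.6e28            # cells
carbon_per_cell   <- 20e-15 / 1e15     # 20 fg of C per cell, expressed in Pg
carbon_stock_pg   <- cells * carbon_per_cell

# Assumption from the working above: about 4x the standing stock per turnover
carbon_per_turnover_pg <- 4 * carbon_stock_pg

# Carbon consumed per year in photic waters: 85% of 51 Pg of C/year
carbon_consumed_pg <- 51 * 0.85

turnovers_per_year <- carbon_consumed_pg / carbon_per_turnover_pg
days_per_turnover  <- 365 / turnovers_per_year

cat(sprintf("Stock: %.2f Pg C; %.1f turnovers/year; %.1f days per turnover\n",
            carbon_stock_pg, turnovers_per_year, days_per_turnover))
```

The unrounded figures differ slightly from the hand calculation, which rounds the consumed carbon to 43 Pg before dividing.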

  • How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of 4 x \(10^{-7}\) per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)

    • [4 x \(10^{-7}\) mutations/generation]\(^{4}\) = 2.56 x \(10^{-26}\) mutations/generation
    • Population of marine heterotrophs in the upper 200 m: 3.6 x \(10^{28}\) cells
    • 365 days / 16 days per turnover = 22.8 turnovers/year

    • 3.6 x \(10^{28}\) cells x 22.8 turnovers/year = 8.2 x \(10^{29}\) cells/year

    • 8.2 x \(10^{29}\) cells/year x 2.56 x \(10^{-26}\) mutations/generation = 2.1 x \(10^{4}\) mutations/year

    • 2.1 x \(10^{4}\) mutations/year is about 2.4 mutations/hour, or one every 0.4 hours, as stated in the paper (1).

    • With a fast turnover rate and a large population size, these numbers are plausible for microbial populations.
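
The mutation-frequency steps above can be reproduced in R. This is a minimal sketch with made-up variable names, assuming a mutation rate of 4 x 10^-7 per replication, 3.6 x 10^28 cells, and a 16-day turnover time.

```r
mutation_rate <- 4e-7                      # mutations per gene per replication
four_simultaneous <- mutation_rate^4       # chance of four mutations at once

# Cells produced per year: population x turnovers per year
cells_per_year <- 3.6e28 * (365 / 16)

# Expected number of four-fold mutation events per year
events_per_year <- cells_per_year * four_simultaneous

# Convert to events per hour and hours between events (8760 hours in a year)
events_per_hour      <- events_per_year / (365 * 24)
hours_between_events <- 1 / events_per_hour

cat(sprintf("%.2e events/year; one event every %.2f hours\n",
            events_per_year, hours_between_events))
```

The last step makes the unit distinction explicit: the rate is a few events per hour, which corresponds to a fraction of an hour between events.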

  • Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?

    • With high mutation rates in such a large population, prokaryotic cells can be highly diverse, and when mutations are beneficial, this increases the microbes’ adaptive potential.
    • Microbial genomes can also diversify and adapt through horizontal gene transfer.
  • What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?

    • With high abundance, rapid replication and high mutation rates, diversity in the population can increase. Increased microbial diversity would in turn increase metabolic potential in response to selective pressure and stress.

Problem set 02 “Microbial Engines”

Falkowski, PG, Fenchel, T, Delong, EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 320:1034-1039. doi: 10.1126/science.1153213. Link

Learning objectives:

Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.

Specific Questions:

  • What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?

    • The primary geophysical processes are tectonics and atmospheric photochemical processes (i.e. erosion and geothermal activity), which continuously supply substrates and remove products on Earth. These processes allow elements and molecules to interact and chemical bonds to form and break in cycles, which keeps planetary chemistry from settling into thermodynamic equilibrium.

    • The primary biogeochemical processes involve the cycles of the six major elements: H, C, N, O, S, and P. The cycles of the first five elements are largely driven by microbes through thermodynamically constrained redox reactions. In addition, volcanism and rock weathering contribute to nutrient cycling on Earth. These events resupply C, S, and P.

    • Abiotic processes are based on acid/base chemistry (i.e., transfers of protons without electrons), while biotic processes are based on redox reactions (i.e., successive transfers of electrons and protons from a relatively limited set of chemical elements). These processes are interconnected in that together they set the lower limits on the external energy needed to sustain the biogeochemical cycles on Earth.

  • Why is Earth’s redox state considered an emergent property?

    • The redox condition of the ocean and atmosphere is far from thermodynamic equilibrium, yet it is fairly stable on geological time scales. This stability arises from feedbacks between the evolution of microbial metabolism and geochemical processes.
  • How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?

    • Following Le Chatelier’s principle, reactions can be driven in the forward or reverse direction, depending on the substrates and products available, to achieve a thermodynamically favorable state. In the case of methanogenic Archaea that form methane from carbon dioxide and hydrogen, the reverse process occurs if the hydrogen concentration is sufficiently low. Microbes overcome thermodynamic barriers to reversible electron flow through close spatial association, which allows synergistic cooperation among microbial members. In the example above, the reverse reaction is driven when the methanogenic Archaea are in close proximity to hydrogen-consuming sulfate reducers, which keep the hydrogen tension low.
  • Using information provided in the text, describe how the nitrogen cycle partitions between different redox “niches” and microbial groups. Is there a relationship between the nitrogen cycle and climate change?

    • The nitrogen cycle can be entered at any point, but for the purpose of explaining it as a cycle, we can take nitrogen fixation as the starting reference point. Nitrogen fixation converts N2 to NH4, which makes nitrogen accessible for the synthesis of proteins and nucleic acids in organisms.

    • In the presence of O2, NH4 is first oxidized to nitrite (NO2) by a specific group of bacteria or archaea, and nitrite is then oxidized to nitrate (NO3) by a different set of nitrifying bacteria. The reducing power gained from these oxidations is used by nitrifiers to reduce CO2 into organic matter.

    • In the absence of O2, a different group of microbes may use NO2 and NO3 as electron acceptors in anaerobic oxidation to eventually produce N2. This closes the N-cycle.

    • Quoted from the text, the N-cycle “forms an interdependent electron pool that is influenced by photosynthetic production of oxygen and the availability of organic matter”. Sunlight availability is affected by climate change. This, in turn, affects the parts of the N-cycle that involve photosynthetic organisms requiring nitrogen oxides as terminal electron acceptors. On the other hand, the N-cycle can also have a positive impact on climate change. Nitrifying organisms can use NH4 or NO2 to reduce CO2 into organic matter, which may lessen the greenhouse effect by decreasing CO2 levels.

  • What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?

    • The different redox reactions that give rise to a certain element or molecule are partitioned among different microbial groups as part of their metabolic pathways. As an example, methanogenic Archaea use carbon dioxide and hydrogen to form methane. This is distinct from hydrogen-consuming sulfate reducers, which also use hydrogen. With their different metabolic functions, each microbial group can have a specialized role in a given community. With more microbial groups specialized in different metabolic pathways, we would expect higher microbial diversity.

    • Microbes are able to transfer genes to other microbes via horizontal gene transfer. The transferred genes are retained due to selective pressures in the environment. The transferred genes may encode part of or an entire metabolic pathway. Following the central dogma of biology, newly transferred genes are expected to be translated into new proteins.

    • With parts of a metabolic pathway distributed among microbial groups, microbial diversity within a given environment would increase. Under specific conditions, different microbes need to transcribe and translate particular genes and produce the proteins required for their survival. This relates to the discovery of new protein families from microbial community genomes, whereby environment-specific genes in a given microorganism are turned on by a particular habitat.

  • On what basis do the authors consider microbes the guardians of metabolism?

    • Metabolic pathways stored in genes can be transferred to, and retained in, microbes under environmental selection. Thus, as quoted from the paper, microbes can be seen as vessels that ferry metabolic machines through strong environmental perturbations and long geological landscapes. Individual members of the microbial community may become extinct, but the core metabolic machines can persist.

Writing Assessment 01

Module 01 references

  1. Achenbach, J. 2012. Spaceship Earth: A new view of environmentalism. Washington Post. Link.

  2. Budny, JA. 2017. Book Review: Aerobiology—The Toxicology of Airborne Pathogens and Toxins. International Journal of Toxicology. 36:50-51. doi: 10.1177/1091581816678191. Link

  3. Canfield, DE, Glazer, AN, Falkowski, PG. 2010. The Evolution and Future of Earth’s Nitrogen Cycle. Science. 330:192-196. doi: 10.1126/science.1186120. PMID: 20929768

  4. Falkowski, PG, Fenchel, T, Delong, EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 320:1034-1039. doi: 10.1126/science.1153213. Link

  5. Kasting, JF, Siefert, JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science. 296:1066-1068. doi: 10.1126/science.1071184. Link

  6. Schrag, DP. 2012. Geobiology of the Anthropocene. Fundamentals of Geobiology. 425-436. Link

  7. Nisbet, EG, Sleep, NH. 2001. The habitat and nature of early life. Nature. 409:1083-1091. doi: 10.1038/35059210. Link

  8. Rockström, J, Steffen, W, Noone, K, Scheffer, M, et al. 2009. A safe operating space for humanity. Nature. 461:472-475. doi: 10.1038/461472a. Link

  9. Waters, CN, Zalasiewicz, J, Summerhayes, C, Barnosky, AD, Poirier, C, Galuszka, A, Cearreta, A, Edgeworth, M, Ellis, EC, Ellis, M, Jeandel, C, Leinfelder, R, McNeill, JR, Richter, DD, Steffen, W, Syvitski, J, Vidas, D, Wagreich, M, Williams, M, An, ZS, Grinevald, J, Odada, E, Oreskes, N, Wolfe, AP. 2016. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science. 351:137. doi: 10.1126/science.aad2622. Link

  10. Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578–6583. PMC33863

Module 02

Module 02 portfolio check

  • Evidence worksheet_04
    • Completion status:
    • Comments: Incomplete –> Now, it’s completed.
  • Problem Set_03
    • Completion status: X
    • Comments:
  • Writing assessment_02
    • CANCELED
  • Additional Readings
    • Completion status:
    • Comments: Need links. –> Links are now added.

Evidence worksheet 04 “Bacterial Rhodopsin Gene Expression”

Martinez, A, Bradley, AS, Waldbauer, JR, Summons, RE, DeLong, EF. 2007. Proteorhodopsin Photosystem Gene Expression Enables Photophosphorylation in a Heterologous Host. Proc. Natl. Acad. Sci. U. S. A. 104:5590-5595. doi: 10.1073/pnas.0611470104.

Learning objectives

  • Discuss the relationship between microbial community structure and metabolic diversity
  • Evaluate common methods for studying the diversity of microbial communities
  • Recognize basic design elements in metagenomic workflows

General questions

  • What were the main questions being asked?

    • What are the specific functions and physiological roles of Proteorhodopsins (PR) derived from marine microbes?
    • What is the physiological basis of light-activated growth stimulation in bacteria containing the PR photosystem?
    • Why is the PR system so ubiquitous?
  • What were the primary methodological approaches used?

    • Functional screening of a large-insert genomic (fosmid) library of marine picoplankton DNA
    • High-density colony macroarrays to identify recombinant clones expressing PR photosystems in vivo
    • Sequencing of clones and genomic analysis, i.e. the fosmid clones were sequenced, analyzed and annotated
    • Carotenoid extraction
    • HPLC analysis to identify the pigments
    • Proton-Pumping Experiments
    • ATP Measurements
  • Summarize the main results or findings.

    • Six genes (five encoding photopigment biosynthetic proteins and one encoding a PR) are needed for a fully functional PR photosystem
    • The PR can be expressed and be functional without the photopigment
    • The PR system can be introduced via a single horizontal gene transfer event, which may explain the high abundance of PR in many organisms
    • Functional screening is a useful approach for identifying phenotypes when no sequence data are available
  • Do new questions arise from the results?

    • How prevalent is the PR system among marine organisms? Can it be expressed as a percentage?
    • What other potential reasons are there for so many marine organisms having acquired the PR system?
    • Is the PR system also abundant in other microorganisms?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

    • Although the authors provided a flow chart, the text of the methods section was difficult to follow. I think it would have been helpful if they had provided a rationale for each of the major steps in their experimental approach.
    • The figures were well presented, summarized some of the results and helped in understanding the paper.
    • The abstract was well-written. I got the whole picture of the paper from first glance.

Problem set 03 “Metagenomics: Genomic Analysis of Microbial Communities”

Wooley, JC, Godzik, A, Friedberg, I. 2010. A primer on metagenomics. PLoS Computational Biology. 6:e1000667. doi: 10.1371/journal.pcbi.1000667.

Learning objectives:

Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)

Specific Questions:

  • How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?

    • At least 89 bacterial and 20 archaeal phyla are recognized via small subunit ribosomal RNA databases, although the true phyla count is certainly higher and could range up to 1,500
      • This is because there may be prokaryotes that live in the “shadow biosphere”, a hypothetical microbial biosphere that has not yet been detected
    • 26 of the approximately 52 identifiable major phyla within the domain Bacteria have cultivated representatives
      • Thus, 52 - 26 = 26 of the major phyla of Bacteria are uncultured
    • The point is that most of life is uncultured. The only information we have about it comes from sequencing.

    • Specific references for the question above:
      • Solden, L, Lloyd, K, Wrighton, K. 2016. The bright side of microbial dark matter: lessons learned from the uncultivated majority. Curr. Opin. Microbiol. 31:217-226. doi: 10.1016/j.mib.2016.04.020.

      • Youssef, NH, Couger, MB, McCully, AL, Criado, AEG, Elshahed, MS. 2015. Assessing the global phylum level diversity within the bacterial domain: A review. Journal of Advanced Research. 6:269-282. doi: 10.1016/j.jare.2014.10.005.

      • Rappé, MS, Giovannoni, SJ. 2003. The uncultured microbial majority. Annual Reviews in Microbiology. 57:369-394. doi: 10.1146/annurev.micro.57.030502.090759.

  • How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?

    • Shotgun metagenomics:
      • Assembly: EULER, IMG/M
      • Binning: S-GSOM, MG-RAST
      • Annotation: KEGG, NCBI
      • Analysis: MEGAN 5
    • Marker gene metagenomics:
      • Standalone software: OTUbase
      • Analysis: SILVA
      • Denoising: AmpliconNoise
      • Databases: Ribosomal Database Project (RDP)
    • References: from a paper, review article
  • What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?

    • Phylogenetic:
      • vertical gene transfer
      • carry phylogenetic info
      • taxonomic
      • ideally single copy
    • Functional:
      • more horizontal gene transfer
      • identify specific biogeochemical functions associated with measurable effects
      • not as useful for phylogenetic reconstruction
  • What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?
    • Binning: the process of grouping sequences that come from a single genome

    • Types of algorithms:
      1. Align sequences to a database
      2. Group sequences based on DNA characteristics: GC content, codon usage
    • Risks and opportunities. Risks:
      • incomplete coverage of genome sequences
      • contamination from a different phylogeny. A question to ask: what is considered contamination?
        • The threshold for contamination should be a minimum of 10%
    • What is a genome from a metagenome?
  • Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?

    • Functional screens (biochemical, etc.)
    • Third-generation sequencing (nanopore)
    • Single-cell sequencing
    • FISH probes
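
The composition-based binning idea noted above (grouping sequences by DNA characteristics such as GC content) can be illustrated with a minimal R sketch. The contig sequences, the `gc_content` helper and the 0.5 cutoff are all hypothetical and chosen only for illustration; real binning tools combine several signals rather than a single threshold.

```r
# Fraction of G and C bases in one sequence (hypothetical helper)
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  mean(bases %in% c("G", "C"))
}

# Made-up contigs, not real data
contigs <- c(c1 = "ATATATTAAT", c2 = "GCGCGGCCGC",
             c3 = "ATTTAATATA", c4 = "GGCCGCGCCG")

gc <- sapply(contigs, gc_content)

# Assumed cutoff of 0.5 to separate two bins
bins <- ifelse(gc >= 0.5, "high_GC", "low_GC")

# Contig names grouped by bin
split(names(contigs), bins)
```

Contigs with similar composition end up in the same bin, which is the intuition behind the second type of algorithm listed above.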

Module 02 references

  1. Madsen, EL. 2005. Opinion: Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews Microbiology. 3:439-446. doi: 10.1038/nrmicro1151. Link

  2. Martinez, A, Bradley, AS, Waldbauer, JR, Summons, RE, DeLong, EF. 2007. Proteorhodopsin Photosystem Gene Expression Enables Photophosphorylation in a Heterologous Host. Proc. Natl. Acad. Sci. U. S. A. 104:5590-5595. doi: 10.1073/pnas.0611470104. PMID: 17372221

  3. Rappé, MS, Giovannoni, SJ. 2003. The uncultured microbial majority. Annual Reviews in Microbiology. 57:369-394. doi: 10.1146/annurev.micro.57.030502.090759. PMID: 14527284

  4. Solden, L, Lloyd, K, Wrighton, K. 2016. The bright side of microbial dark matter: lessons learned from the uncultivated majority. Curr. Opin. Microbiol. 31:217-226. doi: 10.1016/j.mib.2016.04.020. PMID: 27196505

  5. Wooley, JC, Godzik, A, Friedberg, I. 2010. A primer on metagenomics. PLoS Computational Biology. 6:e1000667. doi: 10.1371/journal.pcbi.1000667. PMID: 20195499

  6. Youssef, NH, Couger, MB, McCully, AL, Criado, AEG, Elshahed, MS. 2015. Assessing the global phylum level diversity within the bacterial domain: A review. Journal of Advanced Research. 6:269-282. doi: 10.1016/j.jare.2014.10.005. PMID: 26257925

Module 03

Module 03 portfolio check

  • Evidence worksheet_05
    • Completion status: X
    • Comments:
  • Problem set_04
    • Completion status: X
    • Comments:
  • Writing Assessment_03
    • Completion status:
    • Comments:
  • Additional Readings
    • Completion status:
    • Comments: Need links –> Links are now added.

Evidence worksheet 05 “Extensive mosaic structure”

Welch, RA, Burland, V, Plunkett, G, Redford, P, Roesch, P, Rasko, D, Buckles, EL, Liou, S-, Boutin, A, Hackett, J, Stroud, D, Mayhew, GF, Rose, DJ, Zhou, S, Schwartz, DC, Perna, NT, Mobley, HLT, Donnenberg, MS, Blattner, FR. 2002. Extensive Mosaic Structure Revealed by the Complete Genome Sequence of Uropathogenic Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 99:17020-17024. doi: 10.1073/pnas.252529799.

Learning objectives Part 1

  • Evaluate the concept of microbial species based on environmental surveys and cultivation studies.

  • Explain the relationship between microdiversity, genomic diversity and metabolic potential

  • Comment on the forces mediating divergence and cohesion in natural microbial communities

General questions

  • What were the main questions being asked?

    • How do the genomes of uropathogenic Escherichia coli strain CFT073, enterohemorrhagic E. coli EDL933, and laboratory strain MG1655 compare to each other? What makes each of them distinct from the others?

    • How do these differences relate to their phenotypes?

  • What were the primary methodological approaches used?

    • Isolation of CFT073
    • Whole genome library preparation from the isolated genomic DNA of CFT073
    • Sequencing
    • Sequence analysis and annotation via MAGPIE
  • Summarize the main results or findings.

    • The genome of uropathogenic Escherichia coli strain CFT073 is circular and 5,231,428 bp long. No virulence plasmids were found in CFT073, since these are not usually associated with uropathogenic strains.

    • Codon usage analysis in CFT073 showed that different patterns of usage occur between the backbone and island genes.
    • Among the three strains of interest, there is variation in island genes, which is made possible by horizontal gene transfer. As an example from the text, the CFT073-specific islands contain 2,004 genes, of which only 204 also occur among the EDL933-specific genes.
    • This suggests that there are differences in the genomes between pathogenic strains. This holds true when they are also compared to the benign strain.

    • As quoted from the text, different E. coli pathotypes have maintained a remarkable synteny of common, vertically evolved genes, whereas many islands interrupting this common backbone have been acquired by different horizontal transfer events in each strain.

    • With only 39.2% of the combined set of proteins common to all three strains (CFT073, EDL933 and MG1655), what can be concluded from the study is that members of one species may share a low percentage of their genome and be functionally different.

  • Do new questions arise from the results?

    • How should we define species?
    • Should we now define species based on pathogenicity? What would be the cutoff?

    • Should we take a closer look at the niche in which each pathotype survives when it comes to defining what a species is?

    • Why are virulence plasmids not associated with uropathogenic strain CFT073 even though they are common to many E. coli isolates and usually associated with other uropathogenic strains?

    • Can we assess the presence of black holes, given the large number of genetic differences, with today’s technology?

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

    • I think this paper lacks a conclusion section, which could have helped with understanding the bigger picture and implications of this study.
    • Providing short definitions for jargon such as uropathogenic and enterohemorrhagic would have been helpful.
    • Figure 2 is an advantage in understanding the paper because it puts some of the results into a picture, making them easier to visualize.

Learning objectives Part 2

  • Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
  • Identify common molecular signatures used to infer genomic identity and cohesion
  • Differentiate between mobile elements and different modes of gene transfer
  • Based on your reading and discussion notes, explain the meaning and content of the following figure derived from the comparative genomic analysis of three E. coli genomes by Welch et al. Remember that CFT073 is a uropathogenic strain and that EDL933 is an enterohemorrhagic strain. Explain how this study relates to your understanding of ecotype diversity. Provide a definition of ecotype in the context of the human body. Explain why certain subsets of genes in CFT073 provide adaptive traits under your ecological model and speculate on their mode of vertical descent or gene transfer.

    • The figure compares the locations and sizes of the CFT073 and EDL933 islands. Island size is on the vertical axis; position in the colinear backbone is on the horizontal axis.

    • An ecotype describes a genetically distinct entity within a species which is genotypically adapted to specific environmental conditions. These different ecotypes or different strains of E. coli exhibit phenotypic differences. In the context

    • For uropathogenic strains of E. coli, island acquisition resulted in the capability to infect the urinary tract and bloodstream and evade host defenses without compromising the ability to harmlessly colonize the intestine.

    • For the different intestinal pathogens, acquired genes promote the colonization of specific regions of the intestine and new modes of interaction with the host tissue that produce clinically distinct variations of gastrointestinal disease

    • What makes them the same species, i.e. what makes them all E. coli, is the vertical transfer of ancestral backbone genes. The new genes found in different E. coli strains were acquired via numerous, independent horizontal gene-transfer events.

Problem set_04 “Counting Candy Microbes”

Kunin, V, Engelbrektson, A, Ochman, H, Hugenholtz, P. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ. Microbiol. 12:118-123. doi: 10.1111/j.1462-2920.2009.02051.x.

Sogin, ML, Morrison, HG, Huber, JA, Welch, DM, Huse, SM, Neal, PR, Arrieta, JM, Herndl, GJ. 2006. Microbial Diversity in the Deep Sea and the Underexplored “Rare Biosphere”. Proc. Natl. Acad. Sci. U. S. A. 103:12115-12120. doi: 10.1073/pnas.0605127103.

Learning objectives:

  • Gain experience estimating diversity within a hypothetical microbial community

Outline:

In class Day 1:

  1. Define and describe species within your group’s “microbial” community.
  2. Count and record individuals within your defined species groups.
  3. Remix all species together to reform the original community.
  4. Each person in your group takes a random sample of the community (i.e. divide up the candy).

Assignment:

  1. Individually, complete a collection curve for your sample.
  2. Calculate alpha-diversity based on your original total community and your individual sample.

In class Day 2:

  1. Compare diversity between groups.

Part 1: Description and enumeration

Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island, British Columbia.

Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.

Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.

For example, load in the packages you will use.

#To make tables
library(knitr)
#To manipulate and plot data
library(tidyverse)
library(kableExtra)

Then load in the data. You should use a similar format to record your community data.

example_data1 = data.frame(
  number = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),
  name = c("M_n_Ms", "Kisses", "Skittles","Spheres", "Jolly_Ranchers", "Wine_gummies", "Octupi_gummies","Swirl_gummies", "Cherry_gummies", "Watermelon_gummies","Cola_gummies", "Classic_bear_gummies", "Sugar_coated_bear_gummies", "String", "Lego"),
  characteristics = c("chocolate inside; 6 different colours", "chocolate inside; silver", "sugar inside ; 5 different colour","sugar inside; 3 different colours", "sugar inside; 5 different colour; elongated", "gummy; 2 different colours; matte", "gummy; pink and yellow; sugar coated; 7 legs","gummy; 2 different colours; sugar coated", "gummy; cherry-shaped; sugar coated", "gummy; watermelon-shaped; sugar coated","gummy; soda-shaped; sugar coated", "gummy; bear-shaped", "Sugar-gummy; bear-shaped; sugar coated", "gummy; red; string", "sugar inside; 3 different colour; lego-shaped, hard"),
  occurences = c(52,1,39,6,38,2,0,0,1,0,1,23,1,1,5)
)
example_data2 = data.frame(
  number = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),
  name = c("M_n_Ms", "Kisses", "Skittles","Spheres", "Jolly_Ranchers", "Wine_gummies", "Octupi_gummies","Swirl_gummies", "Cherry_gummies", "Watermelon_gummies","Cola_gummies", "Classic_bear_gummies", "Sugar_coated_bear_gummies", "String", "Lego"),
  characteristics = c("chocolate inside; 6 different colours", "chocolate inside; silver", "sugar inside ; 5 different colour","sugar inside; 3 different colours", "sugar inside; 5 different colour; elongated", "gummy; 2 different colours; matte", "gummy; pink and yellow; sugar coated; 7 legs","gummy; 2 different colours; sugar coated", "gummy; cherry-shaped; sugar coated", "gummy; watermelon-shaped; sugar coated","gummy; soda-shaped; sugar coated", "gummy; bear-shaped", "Sugar-gummy; bear-shaped; sugar coated", "gummy; red; string", "sugar inside; 3 different colour; lego-shaped, hard"),
  occurences = c(214,16,197,19,131,6,6,3,1,1,3,101,3,14,17)
)

Finally, use these data to create a table.

example_data1 %>% 
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)

number | name | characteristics | occurences
1 | M_n_Ms | chocolate inside; 6 different colours | 52
2 | Kisses | chocolate inside; silver | 1
3 | Skittles | sugar inside ; 5 different colour | 39
4 | Spheres | sugar inside; 3 different colours | 6
5 | Jolly_Ranchers | sugar inside; 5 different colour; elongated | 38
6 | Wine_gummies | gummy; 2 different colours; matte | 2
7 | Octupi_gummies | gummy; pink and yellow; sugar coated; 7 legs | 0
8 | Swirl_gummies | gummy; 2 different colours; sugar coated | 0
9 | Cherry_gummies | gummy; cherry-shaped; sugar coated | 1
10 | Watermelon_gummies | gummy; watermelon-shaped; sugar coated | 0
11 | Cola_gummies | gummy; soda-shaped; sugar coated | 1
12 | Classic_bear_gummies | gummy; bear-shaped | 23
13 | Sugar_coated_bear_gummies | Sugar-gummy; bear-shaped; sugar coated | 1
14 | String | gummy; red; string | 1
15 | Lego | sugar inside; 3 different colour; lego-shaped, hard | 5

example_data2 %>% 
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)

number | name | characteristics | occurences
1 | M_n_Ms | chocolate inside; 6 different colours | 214
2 | Kisses | chocolate inside; silver | 16
3 | Skittles | sugar inside ; 5 different colour | 197
4 | Spheres | sugar inside; 3 different colours | 19
5 | Jolly_Ranchers | sugar inside; 5 different colour; elongated | 131
6 | Wine_gummies | gummy; 2 different colours; matte | 6
7 | Octupi_gummies | gummy; pink and yellow; sugar coated; 7 legs | 6
8 | Swirl_gummies | gummy; 2 different colours; sugar coated | 3
9 | Cherry_gummies | gummy; cherry-shaped; sugar coated | 1
10 | Watermelon_gummies | gummy; watermelon-shaped; sugar coated | 1
11 | Cola_gummies | gummy; soda-shaped; sugar coated | 3
12 | Classic_bear_gummies | gummy; bear-shaped | 101
13 | Sugar_coated_bear_gummies | Sugar-gummy; bear-shaped; sugar coated | 3
14 | String | gummy; red; string | 14
15 | Lego | sugar inside; 3 different colour; lego-shaped, hard | 17

For your community:

  • Construct a table listing each species, its distinguishing characteristics, the name you have given it, and the number of occurrences of the species in the collection.
  • Ask yourself if your collection of microbial cells from seawater represents the actual diversity of microorganisms inhabiting waters along the Line-P transect. Were the majority of different species sampled or were many missed?
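
One way to approach the second question is to compare the sample against the original total community directly. The sketch below is only an illustration in base R; it re-enters the names and counts from the two example tables, and the variable names (`candy`, `sample_counts`, `community_counts`, `missed`) are hypothetical.

```r
# Names and counts copied from the example tables above
candy <- c("M_n_Ms", "Kisses", "Skittles", "Spheres", "Jolly_Ranchers",
           "Wine_gummies", "Octupi_gummies", "Swirl_gummies", "Cherry_gummies",
           "Watermelon_gummies", "Cola_gummies", "Classic_bear_gummies",
           "Sugar_coated_bear_gummies", "String", "Lego")
sample_counts    <- c(52, 1, 39, 6, 38, 2, 0, 0, 1, 0, 1, 23, 1, 1, 5)
community_counts <- c(214, 16, 197, 19, 131, 6, 6, 3, 1, 1, 3, 101, 3, 14, 17)

# Observed richness = number of species with at least one individual
sample_richness    <- sum(sample_counts > 0)
community_richness <- sum(community_counts > 0)

# Species present in the community but absent from the sample
missed <- candy[sample_counts == 0 & community_counts > 0]

sample_richness
community_richness
missed
```

Under these data the sample recovers most, but not all, of the species in the community; the species it misses are all rare in the original community.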

Part 2: Collector’s curve

To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is a standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis and the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that will decrease as more individuals are classified and as fewer species remain to be identified. If sampling stops while the curve is still rapidly increasing then this indicates that sampling is incomplete and many species remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.

To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.

For example, we load in these data.

example_data3 = data.frame(
  x = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170),
  y = c(1,1,1,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,7,8,8,8,8,8,8,8,9,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12)
)

And then create a plot. We will use a scatterplot (geom_point) to plot the raw data and then add a smoother to see the overall trend of the data.

ggplot(example_data3, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth() +
  labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'

For your sample:

  • Create a collector’s curve for your sample (not the entire original community).
    • See graph above.
  • Does the curve flatten out? If so, after how many individual cells have been collected?
    • Yes, it flattens out with 85 individual cells collected.
  • What can you conclude from the shape of your collector’s curve as to your depth of sampling?
    • All 12 species had been recorded by the time 85 candies had been collected. Thus, collecting 85 candies may capture 100% of the species in this particular sample.
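
The construction described in Part 2 (walk through the cells one at a time and count how many distinct species have been seen so far) can also be sketched in a few lines of base R. The `draws` vector below is a made-up sequence of species labels, not the actual candy sample, and `first_at_max` is a hypothetical name for the point at which the curve stops rising.

```r
# Hypothetical order in which individuals were drawn (labels are species)
draws <- c("A", "A", "B", "A", "C", "B", "C", "D", "A", "D")

collector <- data.frame(
  # x: cumulative number of individuals classified
  x = seq_along(draws),
  # y: cumulative number of species observed; !duplicated() is TRUE
  #    the first time a label appears, so its running sum counts new species
  y = cumsum(!duplicated(draws))
)

# First x at which the curve reaches its final height
first_at_max <- min(collector$x[collector$y == max(collector$y)])

collector
first_at_max
```

Applied to the `y` column of the example data, the same `min(...)` expression gives the point at which the collector's curve flattens out.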

Part 3: Diversity estimates (alpha diversity)

Using the table from Part 1, calculate species diversity using the following indices or metrics.

Diversity: Simpson Reciprocal Index

\(\frac{1}{D}\) where \(D = \sum p_i^2\)

\(p_i\) = the fractional abundance of the \(i^{th}\) species
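
As a minimal sketch of the formula, the index can be written as a small vectorized R function. The function name `inv_simpson` and the toy community (3 species with 2, 4, and 1 individuals) are hypothetical and used only to show the arithmetic.

```r
# Simpson Reciprocal Index, 1/D with D = sum(p_i^2)
inv_simpson <- function(counts) {
  p <- counts / sum(counts)   # fractional abundance of each species
  1 / sum(p^2)
}

# Toy community: 3 species with 2, 4, and 1 individuals
# D = (2/7)^2 + (4/7)^2 + (1/7)^2 = 21/49, so 1/D = 49/21
toy_index <- inv_simpson(c(2, 4, 1))

# A perfectly even community of 3 species reaches the maximum, 3
even_index <- inv_simpson(c(5, 5, 5))

toy_index
even_index
```

The even community illustrates why the maximum value of the index equals the number of species.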

For example, using example data 1 (the sample, 170 individuals in total), \(\frac{1}{D}\) =

# Fractional abundance of each species in the sample (170 individuals)
M_n_Ms = 52 /(170)
Kisses = 1 /(170)
Skittles = 39 /(170)
Spheres = 6 /(170)
Jolly_Ranchers = 38 /(170)
Wine_gummies = 2 /(170)
Octupi_gummies = 0 /(170)
Swirl_gummies = 0 /(170)
Cherry_gummies = 1 /(170)
Watermelon_gummies = 0 /(170)
Cola_gummies = 1 /(170)
Classic_bear_gummies = 23 /(170)
Sugar_coated_bear_gummies   = 1 /(170)
String = 1 /(170)
Lego= 5 /(170)
  
1 / (M_n_Ms^2 + Kisses^2 + Skittles^2 + Spheres^2 + Jolly_Ranchers^2 + Wine_gummies^2 + Octupi_gummies^2 + Swirl_gummies^2 + Cherry_gummies^2 + Watermelon_gummies^2 + Cola_gummies^2 + Classic_bear_gummies^2 + Sugar_coated_bear_gummies^2 + String^2 + Lego^2)
## [1] 4.610721

And using example data 2 (the original total community, 732 individuals in total), \(\frac{1}{D}\) =

# Fractional abundance of each species in the original total community (732 individuals)
M_n_Ms = 214 /(732)
Kisses = 16 /(732)
Skittles = 197 /(732)
Spheres = 19 /(732)
Jolly_Ranchers = 131 /(732)
Wine_gummies = 6 /(732)
Octupi_gummies = 6 /(732)
Swirl_gummies = 3 /(732)
Cherry_gummies = 1 /(732)
Watermelon_gummies = 1 /(732)
Cola_gummies = 3 /(732)
Classic_bear_gummies = 101 /(732)
Sugar_coated_bear_gummies   = 3 /(732)
String = 14 /(732)
Lego= 17 /(732)
  
1 / (M_n_Ms^2 + Kisses^2 + Skittles^2 + Spheres^2 + Jolly_Ranchers^2 + Wine_gummies^2 + Octupi_gummies^2 + Swirl_gummies^2 + Cherry_gummies^2 + Watermelon_gummies^2 + Cola_gummies^2 + Classic_bear_gummies^2 + Sugar_coated_bear_gummies^2 + String^2 + Lego^2)
## [1] 4.734682

The higher the value is, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects the number of species present (richness) and the relative proportions of each species within a community (evenness), this metric is a diversity metric. Consider that a community can have the same number of species (equal richness) but manifest a skewed distribution in the proportion of each species (unequal evenness), which would result in different diversity values.

  • What is the Simpson Reciprocal Index for your sample? 4.610721
  • What is the Simpson Reciprocal Index for your original total community? 4.734682
Richness: Chao1 richness estimator

Another way to calculate diversity is to estimate the number of species that are present in a sample based on the empirical data to give an upper boundary of the richness of a sample. Here, we use the Chao1 richness estimator.

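\(S_{chao1} = S_{obs} + \frac{a^2}{2b}\)

\(S_{obs}\) = total number of species observed
a = number of species observed once
b = number of species observed twice or more

So for example data 1 (the sample: 12 species observed, 5 of them once and 7 twice or more) and example data 2 (the original total community: 15 species observed, 2 of them once and 13 twice or more), \(S_{chao1}\) =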

12 + 5^2/(7*2)
## [1] 13.78571
15 + 2^2/(13*2)
## [1] 15.15385
  • What is the chao1 estimate for your sample? 13.78571
  • What is the chao1 estimate for your original total community? 15.15385
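
The hand calculation above can be wrapped in a small function so that a and b are counted from the data rather than by eye. This is a sketch that follows the definitions used in this worksheet (b = species observed twice or more); Chao1 is more commonly defined with b as the number of species observed exactly twice, which would give different values. The name `chao1_worksheet` is hypothetical.

```r
# Chao1 as defined in this worksheet (assumption carried over from the text):
#   a = species observed once, b = species observed twice or more
chao1_worksheet <- function(counts) {
  s_obs <- sum(counts > 0)    # species observed at least once
  a <- sum(counts == 1)
  b <- sum(counts >= 2)
  s_obs + a^2 / (2 * b)
}

# Counts copied from the example tables in Part 1
sample_counts    <- c(52, 1, 39, 6, 38, 2, 0, 0, 1, 0, 1, 23, 1, 1, 5)
community_counts <- c(214, 16, 197, 19, 131, 6, 6, 3, 1, 1, 3, 101, 3, 14, 17)

chao1_sample    <- chao1_worksheet(sample_counts)     # 12 + 5^2 / (2 * 7)
chao1_community <- chao1_worksheet(community_counts)  # 15 + 2^2 / (2 * 13)

chao1_sample
chao1_community
```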

Part 4: Alpha-diversity functions in R

We’ve been doing the above calculations by hand, which is a very good exercise to aid in understanding the math behind these estimates. Not surprisingly, these same calculations can be done with R functions. Since we just have a species table, we will use the vegan package. You will need to install this package if you have not done so previously.

library(vegan)

First, we must remove the unnecessary data columns and transpose the data so that vegan reads it as a species table with species as columns and samples as rows (of which you only have 1).

example_data1_diversity = 
  example_data1 %>% 
  select(name, occurences) %>% 
  spread(name, occurences)

example_data1_diversity
example_data2_diversity = 
  example_data2 %>% 
  select(name, occurences) %>% 
  spread(name, occurences)

example_data2_diversity

Then we can calculate the Simpson Reciprocal Index using the diversity function.

diversity(example_data1_diversity, index="invsimpson")
## [1] 4.610721
diversity(example_data2_diversity, index="invsimpson")
## [1] 4.734682

And we can calculate the Chao1 richness estimator (and others by default) with the specpool function for extrapolated species richness. This function rounds to the nearest whole number so the value will be slightly different from what you’ve calculated above.

specpool(example_data1_diversity)
specpool(example_data2_diversity)
diversity(example_data1_diversity, index="shannon")
## [1] 1.730667
diversity(example_data2_diversity, index="shannon")
## [1] 1.799751
  • What is the alpha diversity (Shannon index) of your sample? 1.730667
  • What is the alpha diversity (Shannon index) of your original total community? 1.799751

In Project 1, you will also see functions for calculating alpha-diversity in the phyloseq package since we will be working with data in that form.
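
For comparison with the by-hand calculations in Part 3, the Shannon values reported in Part 4 can be reproduced in base R as \(H = -\sum p_i \ln p_i\). This is a sketch only; the function name `shannon` is hypothetical, and species with zero counts are dropped because \(0 \cdot \ln 0\) is not defined in R.

```r
# Shannon index with natural logarithm, ignoring absent species
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Counts copied from the example tables in Part 1
sample_counts    <- c(52, 1, 39, 6, 38, 2, 0, 0, 1, 0, 1, 23, 1, 1, 5)
community_counts <- c(214, 16, 197, 19, 131, 6, 6, 3, 1, 1, 3, 101, 3, 14, 17)

shannon_sample    <- shannon(sample_counts)
shannon_community <- shannon(community_counts)

shannon_sample
shannon_community
```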

For your sample:

  • What are the Simpson Reciprocal Indices for your sample and community using the R function?

    • Sample: 4.610721
    • Community: 4.734682
  • What are the chao1 estimates for your sample and community using the R function?
    • Sample: 12
    • Community: 15

    • These values do not exactly match the previous hand calculations (13.78571 for the sample and 15.15385 for the community).
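
One possible cross-check for the Chao1 values is vegan's `estimateR`, which works on abundance counts from a single sample, whereas `specpool` is built around species incidence across sampling units. The sketch below runs it on a made-up count vector, not the worksheet data. vegan's documentation describes a bias-corrected Chao1 for `estimateR`, so the result is not expected to reproduce the classic hand formula exactly.

```r
library(vegan)

# Hypothetical abundance vector: 2 singletons and 13 doubletons
toy_counts <- c(rep(1, 2), rep(2, 13))

# estimateR returns S.obs, S.chao1, se.chao1, S.ACE and se.ACE
est <- estimateR(toy_counts)
est[c("S.obs", "S.chao1")]
```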

Part 5: Concluding activity

If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.

  • How does the measure of diversity depend on the definition of species in your samples?
    • If species are defined less stringently, one would expect a lower diversity measure. For example, in our candy experiment, if each “species” were classified only as chocolate, gummy or hard candy, having just 3 different groups would result in less diversity. On the other hand, if you defined your species more rigidly (based on colour, shape and composition), you would have more distinct species, and thereby a higher diversity measure would be observed.
  • Can you think of alternative ways to cluster or bin your data that might change the observed number of species?

    • We could cluster the data based on colour AND the composition. For example, a red skittle candy would be a different species from a blue skittle candy.
  • How might different sequencing technologies influence observed diversity in a sample?

    • Sequencing errors could alter observed diversity if sequences were misclassified.
    • The database used may be biased in classifying species as the same or different.
    • A lack of controls and improper filtering of sequence data could also distort observed diversity.
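
The effect of the species definition on diversity can be illustrated with a small base R sketch. The candy table, its column names and the helper `bin_counts` are all made up for illustration; the same candies are binned once by type only and once by type and colour.

```r
# Hypothetical candy counts
candy <- data.frame(
  type   = c("chocolate", "chocolate", "gummy", "gummy", "gummy", "hard"),
  colour = c("brown", "red", "red", "green", "yellow", "red"),
  count  = c(12, 8, 5, 5, 4, 6)
)

# Sum counts within each "species", where a species is defined by the columns in `by`
bin_counts <- function(df, by) {
  aggregate(df["count"], by = df[by], FUN = sum)$count
}

# Shannon index with natural log
shannon <- function(counts) {
  p <- counts / sum(counts)
  -sum(p * log(p))
}

coarse <- bin_counts(candy, "type")               # loose definition: type only
fine   <- bin_counts(candy, c("type", "colour"))  # strict definition: type and colour

length(coarse)   # observed richness under the loose definition
## [1] 3
length(fine)     # observed richness under the strict definition
## [1] 6
shannon(fine) > shannon(coarse)
## [1] TRUE
```

Merging bins can only lower or preserve the Shannon index, which is why the looser definition gives the smaller value here.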

Writing Assessment 03

Module 03 references

  1. Callahan, BJ, McMurdie, PJ, Holmes, SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal. 11:2639.

  2. Gaudet, AD, Ramer, LM, Nakonechny, J, Cragg, JJ, Ramer, MS. 2010. Small-group learning in an upper-level university biology class enhances academic performance and student attitudes toward group work. PloS One. 5:e15821. doi: 10.1371/journal.pone.0015821.

  3. Hallam, SJ, Torres-Beltrán, M, Hawley, AK. 2017. Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Scientific Data. 4:.

  4. Hawley, AK, Torres-Beltrán, M, Zaikova, E, Walsh, DA, Mueller, A, Scofield, M, Kheirandish, S, Payne, C, Pakhomova, L, Bhatia, M. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific Data. 4:170160.

  5. Kunin, V, Engelbrektson, A, Ochman, H, Hugenholtz, P. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ. Microbiol. 12:118-123. doi: 10.1111/j.1462-2920.2009.02051.x. PMID: 19725865.

  6. Sogin, ML, Morrison, HG, Huber, JA, Welch, DM, Huse, SM, Neal, PR, Arrieta, JM, Herndl, GJ. 2006. Microbial Diversity in the Deep Sea and the Underexplored “Rare Biosphere”. Proc. Natl. Acad. Sci. U. S. A. 103:12115-12120. doi: 10.1073/pnas.0605127103. PMID: 16880384.

  7. Torres-Beltrán, M, Hawley, AK, Capelle, D, Zaikova, E, Walsh, DA, Mueller, A, Scofield, M, Payne, C, Pakhomova, L, Kheirandish, S. 2017. A compendium of geochemical information from the Saanich Inlet water column. Scientific Data. 4:170159.

  8. Welch, RA, Burland, V, Plunkett, G, Redford, P, Roesch, P, Rasko, D, Buckles, EL, Liou, S-, Boutin, A, Hackett, J, Stroud, D, Mayhew, GF, Rose, DJ, Zhou, S, Schwartz, DC, Perna, NT, Mobley, HLT, Donnenberg, MS, Blattner, FR. 2002. Extensive Mosaic Structure Revealed by the Complete Genome Sequence of Uropathogenic Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 99:17020-17024. doi: 10.1073/pnas.252529799.

Project 1

  • CATME account setup and survey
    • Completion status: X
    • Comments:
  • CATME interim group assessment
    • Completion status: X
    • Comments:
  • Project 1
    • Report (80%):
    • Participation (20%):

Module 04 Portfolio Content

Project 2

  • CATME final group assessment
    • Completion status:
    • Comments:
  • Project 2
    • Report (80%):
    • Participation (20%):